Introduction to Spatial Analysis
Day 1 - Concepts and Datasets
Jonathan Phillips
January, 2019
Geography
- What is your favourite sport?
- Do you speak Spanish?
- Do you know who Fofão is?
- How many kisses on the cheek do you greet someone with?
- If you are on your own in a taxi do you sit in the front or back?
- Do you think government policy should allow free migration?
- Where do you live?
Geography
Knowledge and communication depend on where we live
Social norms and customs depend on where we live
Political preferences depend on where we live
Geography
Tobler’s First Law of Geography:
“Everything is related to everything else, but near things are more related than distant things”
Geography
What does ‘near’ mean?
- Concepts of distance:
- Euclidean
- Great Circle
- Manhattan
- Levenshtein
- Mahalanobis
- Driving
- Network
- Minimum-cost
- Genetics
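A few of these distance concepts can be sketched in code. The following is a minimal illustration, not a reference implementation: the function names are hypothetical, the great-circle version uses the haversine formula, and it assumes a spherical Earth with a mean radius of 6371 km.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two (x, y) points on a flat plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Distance travelled along a rectangular street grid."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance on a sphere (radius_km is an assumed mean Earth radius)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def levenshtein(s, t):
    """Minimum number of single-character edits that turn string s into string t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]
```

The same pair of points can be 5 units apart in Euclidean terms and 7 units apart in Manhattan terms, so the choice of distance is part of the analysis.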
Geography
- What does ‘related’ mean?
- Correlated
- More similar
- More different (e.g. dialing codes, to avoid typing errors)
- ‘Related’ does not mean one person ‘causes’ a similar effect on another
- It may just be a common response to a similar environment
- But interactions and spillovers are common
Geography
- Locations of ‘Events’ could be ‘near’ to each other

Geography
- Or characteristics of locations could be ‘near’ to each other

Geography
- Multiple characteristics could also be ‘near’ to each other

Geography
- But isn’t the world getting smaller?
- ‘The death of distance’
- Everything is ‘near’ on the internet
- Relevant distances may be changing
- Cost of flights instead of kilometres or hours
- Language and social network instead of proximity to radio tower
- Spatial relationships take place at multiple scales
- I am Welsh, British, European etc.
- The similarities between rural China and rural Russia are greater than the differences
Geography
- Lots of interesting questions are really non-spatial
- We can draw maps of them
- But the conclusion does not depend on the locations of the units
| Non-spatial question | Spatial question |
|---|---|
| Which state in Brazil is richest? (DF) | Where in Brazil are states richest? (Southeast) |
| How many countries have had cases of Ebola? (11) | Which part of Africa was affected by Ebola? (West and Central) |
| What is the population of the USA? (~325m) | How many people live west of the Mississippi? (~136m) |
Geography
- Physical features also affect social and political processes
- Attracting economic activity
- Preventing interactions
Merits of Spatial Analysis
Opportunities:
- Deeper explanations for common outcomes
- ‘Where’ helps us understand ‘why’
- Avoid confounding relationships
- Enabling new inferential methodologies
Limitations:
- Data are not ‘independent’ for statistical analysis
- Data are often aggregated, and the level of aggregation affects our conclusions (Modifiable Areal Unit Problem, Ecological Fallacy)
- Measured lengths of complex shapes such as coastlines are not ‘fixed’: they depend on the scale of measurement (fractals)
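The aggregation point can be illustrated with a toy example. This is only a sketch with invented values: six locations in a row are grouped into zones in two different ways, and the function name `zone_means` is hypothetical.

```python
from statistics import mean

# Invented values observed at six neighbouring locations, ordered west to east.
values = [1, 1, 9, 9, 1, 1]


def zone_means(values, zone_size):
    """Aggregate consecutive locations into zones and return each zone's mean."""
    return [mean(values[i:i + zone_size]) for i in range(0, len(values), zone_size)]


small_zones = zone_means(values, 2)  # three zones of two locations each
large_zones = zone_means(values, 3)  # two zones of three locations each

print(small_zones)  # the middle zone stands out as a hot spot
print(large_zones)  # both zones have the same mean, so the hot spot disappears
```

The underlying data are identical in both cases; only the zone boundaries change, which is the core of the Modifiable Areal Unit Problem.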
Map Literacy
- Maps are clear and convincing
- Patterns may only be visible when arranged spatially
- If you have spatial data, why put it in a table or a chart?
Map Literacy
- But maps still require careful interpretation
Map Literacy
- Scale
- Can I walk from The Art Institute of Chicago to Union Station in 10 minutes?

Map Literacy
- Compass
- What’s the best place to view the sunset in the Wirral (UK)?

Map Literacy
- Legend
- Can be manipulated to convey relevant (or misleading!) conclusions

Map Literacy
- Choosing the Indicator
- The most important choice: what precisely do we want to convey?

Map Literacy
- Mapping values to colours
- Hard: choosing break points between categories
- Equal intervals, quantiles, standard deviations, ‘natural’ breaks
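To see why the choice of break points matters, here is a minimal sketch comparing equal-interval and quantile breaks on an invented, skewed set of values. The function names are hypothetical, and the quantile breaks rely on `statistics.quantiles` with its "inclusive" method, which is one of several possible quantile definitions.

```python
from bisect import bisect_left
from statistics import quantiles

# Invented indicator values for ten areas, with one extreme outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]


def equal_interval_breaks(values, n_classes):
    """Split the range from min to max into n_classes bands of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * k for k in range(1, n_classes)]


def quantile_breaks(values, n_classes):
    """Choose breaks so that each class holds roughly the same number of areas."""
    return quantiles(values, n=n_classes, method="inclusive")


def classify(value, breaks):
    """Return the index of the colour class (0 = lowest) that a value falls into."""
    return bisect_left(breaks, value)


equal_classes = [classify(v, equal_interval_breaks(values, 4)) for v in values]
quantile_classes = [classify(v, quantile_breaks(values, 4)) for v in values]
```

With equal intervals, nine of the ten areas share the lowest colour and only the outlier differs; with quantiles, the areas are spread across all four colours. The same data can therefore produce very different-looking maps.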

Vector vs. Raster Data
- Vector
- Start with a blank page
- Add specific objects (points, lines, polygons) defined by coordinates (x,y)
- The computer stores just the coordinates of the objects
- Non-spatial ‘Attributes’ of each object allow complex analyses
- Raster
- Start with a grid
- Each grid square (pixel) has a value
- The computer stores one value for every grid square (fixed memory size)
- Mostly for ‘continuous’ remote sensing (satellite) images
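The difference between the two data models can be sketched with plain data structures. Everything here is hypothetical and invented for illustration (the object names, attribute names, grid values and the helper `raster_value_at`); real spatial libraries use richer types.

```python
import math

# Vector: explicit objects, each defined by coordinates plus non-spatial attributes.
school = {"type": "point", "coords": (2.0, 1.0), "attributes": {"name": "School A"}}
road = {
    "type": "line",
    "coords": [(0.0, 0.0), (2.0, 1.0), (3.0, 3.0)],
    "attributes": {"lanes": 2},
}

# Raster: a regular grid. Only the top-left corner, the cell size and one value
# per cell are stored; the location of each cell is implied by its row and column.
raster = {
    "origin": (0.0, 4.0),  # x, y of the top-left corner
    "cell_size": 1.0,
    "values": [
        [0, 0, 1, 1],
        [0, 2, 2, 1],
        [0, 2, 3, 1],
        [0, 0, 1, 1],
    ],
}


def raster_value_at(raster, x, y):
    """Look up the value of the grid cell that contains the location (x, y)."""
    x0, y0 = raster["origin"]
    cell = raster["cell_size"]
    col = math.floor((x - x0) / cell)
    row = math.floor((y0 - y) / cell)  # rows count downwards from the top edge
    values = raster["values"]
    if not (0 <= row < len(values) and 0 <= col < len(values[0])):
        raise ValueError("location is outside the raster")
    return values[row][col]
```

The vector objects take up more memory as they become more detailed, while the raster always stores exactly rows × columns values, whatever it depicts.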
Types of Vector Data
- Whether to represent a feature as a point, line or polygon is an analysis choice, and depends on scale
Types of Vector Data
- The attributes we assign to vector objects also vary

Locations in Space
- Latitude = Angle from the equator (N/S)
- Longitude = Angle from Greenwich, London (E/W)

Locations in Space
- Longitude & Latitude can be written in different formats
- DMS (degrees, minutes, seconds): 49°30′00″N, 123°30′00″W
- DM (degrees, decimal minutes): 49°30.0′, -123°30.0′
- Decimal Degrees: 49.5000°, -123.5000°
- But all of these use the same Geographic Coordinate System
- And we ‘always’ use the same one
- WGS-84
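Converting between these formats is simple arithmetic: one degree is 60 minutes and one minute is 60 seconds. The sketch below uses a hypothetical helper name and assumes the convention that south and west are negative.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    sign = -1 if hemisphere in ("S", "W") else 1  # assumed sign convention
    return sign * (degrees + minutes / 60 + seconds / 3600)


lat = dms_to_decimal(49, 30, 0, "N")   # 49.5
lon = dms_to_decimal(123, 30, 0, "W")  # -123.5
```

The two results match the decimal-degree version of the same location listed above.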
Locations in Space

- The Earth is not a perfect sphere but an oblate spheroid; a ‘datum’ models this shape so that we get locations correct
- No need to worry about this: WGS-84 includes its own datum
Locations in Space
- But we view maps on flat surfaces: paper or screens
- To produce flat maps we need a Projected Coordinate Reference System
- Translating 3-D locations to 2-D locations
- There are many different ways to do this, just as there are many ways to peel an orange
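As one concrete way of "peeling the orange", the sketch below implements the forward equations of the spherical Mercator projection, which preserves shape but not area. It is an illustration under stated assumptions rather than a production routine: the Earth is treated as a sphere with radius 6378137 m, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # assumed sphere radius in metres


def mercator(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Project longitude/latitude in degrees to flat x/y coordinates in metres."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam
    y = radius * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def band_height(lat_deg):
    """Projected height of the 1-degree band of latitude that starts at lat_deg."""
    return mercator(0, lat_deg + 1)[1] - mercator(0, lat_deg)[1]


# On a sphere every 1-degree band of latitude has the same north-south extent,
# but the projection stretches bands far from the equator.
stretch = band_height(60) / band_height(0)
```

Here `stretch` comes out at roughly 2, meaning that features around 60° latitude are drawn about twice as tall as equivalent features at the equator.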
Locations in Space
- Projections can preserve shape, area or distance, but not all three!

Locations in Space
- Coordinate Reference Systems have useful shortcut EPSG codes
- In R, this is all you need
| Coordinate Reference System | Type | EPSG code |
|---|---|---|
| WGS-84 | Geographic | 4326 |
| Corrego Alegre / UTM zone 23S (Coastal Brazil) | Projected | 22523 |
| Chua / UTM zone 23S (Distrito Federal) | Projected | 4071 |
Locations in Space
- Which Coordinate Reference System (CRS) should I use?
- Important: You don’t choose - your data sources already come with a specific CRS
- Important: ALL data in the analysis must use the same CRS
- That means sometimes we have to transform from one coordinate system to another
- For projections, do you want to convey shape, area or distance accurately?
- For distance, what units do you want to use?
- Geographic: Degrees
- Projected: Meters (usually)
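The choice of units matters because a degree is not a fixed distance on the ground. The following sketch, which assumes a spherical Earth with a mean radius of 6371 km and uses hypothetical function names, shows how the length of one degree of longitude shrinks away from the equator.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in metres


def metres_per_degree_latitude(radius=EARTH_RADIUS_M):
    """North-south length of one degree of latitude on a sphere."""
    return math.radians(1) * radius


def metres_per_degree_longitude(lat_deg, radius=EARTH_RADIUS_M):
    """East-west length of one degree of longitude at a given latitude."""
    return math.radians(1) * radius * math.cos(math.radians(lat_deg))


at_equator = metres_per_degree_longitude(0)   # roughly 111 km
at_60_north = metres_per_degree_longitude(60)  # roughly half of that
```

This is one reason to transform data to a projected CRS in metres before measuring distances or areas.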
Georeferencing
- With a CRS, computers understand locations such as -23.562778, -46.725261
- But what if we have a street address?
Spatial Datasets
- Vector Spatial Datasets
- Coordinates for every object
- Multiple coordinates for lines, polygons
| ID | Name | Coordinates (longitude, latitude) |
|---|---|---|
| 001 | Minas Gerais | -48.77246, -17.773988 |
| 002 | Rio de Janeiro | -49.24686, -16.819800 |
Spatial Datasets
- Vector Spatial Datasets
- Coordinates for every object
- Multiple coordinates for lines, polygons
| ID | Name | Geometry |
|---|---|---|
| 001 | Minas Gerais | MULTIPOLYGON ((( -48.77246 -17.773988, -48.77252 -17.773970, -48.77266 -17.773990))) |
| 002 | Rio de Janeiro | MULTIPOLYGON ((( -49.24686 -16.819800, -49.24701 -16.819812, -49.24707 -16.819838))) |
Spatial Datasets
- One single ‘Multipolygon’ can be complicated
- Composed of many distinct polygons
- Polygons can have ‘holes’ in them
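The nesting can be sketched as lists within lists: a multipolygon is a list of polygons, and each polygon is an exterior ring followed by zero or more hole rings. The example below uses invented planar coordinates and hypothetical function names, and computes area with the shoelace formula, subtracting the holes.

```python
def ring_area(ring):
    """Area enclosed by a ring of (x, y) vertices, using the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def polygon_area(polygon):
    """Area of one polygon: its exterior ring minus any holes."""
    exterior, *holes = polygon
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


def multipolygon_area(multipolygon):
    """Total area of all the distinct polygons in a multipolygon."""
    return sum(polygon_area(polygon) for polygon in multipolygon)


# Invented example: a 4x4 square with a 2x2 hole, plus a separate 1x1 square.
mainland = [
    [(0, 0), (4, 0), (4, 4), (0, 4)],  # exterior ring
    [(1, 1), (3, 1), (3, 3), (1, 3)],  # hole
]
island = [[(10, 10), (11, 10), (11, 11), (10, 11)]]
region = [mainland, island]

total_area = multipolygon_area(region)
```

The areas are in squared coordinate units, so this only gives meaningful results for data in a projected CRS, not for raw longitude and latitude.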

Spatial Datasets
- Raster Spatial Datasets
- Coordinates for every data point
| x (longitude) | y (latitude) | Value |
|---|---|---|
| -106.05 | 35.96 | 0 |
| -106.06 | 35.96 | 13 |
| -105.07 | 35.96 | 2 |
| -105.08 | 35.96 | 0 |
Spatial Datasets
- Historically, vector data has been stored as shapefiles
- Shapefiles split the attribute table, geometry and projection into separate files
| File | Contents |
|---|---|
| Data.shp | Geometry details |
| Data.dbf | Non-spatial attribute data (a table) |
| Data.shx | Indexing of the geometry to match the table |
| Data.prj | Details of the projection |
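Because a shapefile is really a set of files that must travel together, a common source of errors is copying only the .shp file. The sketch below is a hypothetical helper, not part of any library, that checks a list of file names for the four components listed above.

```python
from pathlib import Path

# The four components described above, keyed by file extension.
SHAPEFILE_PARTS = {
    ".shp": "geometry",
    ".dbf": "attribute table",
    ".shx": "geometry index",
    ".prj": "projection",
}


def missing_parts(shp_name, available_files):
    """Return the extensions of any shapefile components absent from a file list."""
    stem = Path(shp_name).stem
    available = {name.lower() for name in available_files}
    return [ext for ext in SHAPEFILE_PARTS if f"{stem}{ext}".lower() not in available]


folder_contents = ["Data.shp", "Data.dbf", "Data.shx"]
missing = missing_parts("Data.shp", folder_contents)
```

In this invented folder listing the projection file is missing, so the geometry could be read but its CRS would be unknown.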
Spatial Datasets
- Raster data is typically stored as .tiff files (GeoTIFF)
- The same as you get from a camera or scanner
- But with location and projection data so that we know ‘where’ the image corresponds to
Non-Spatial Joins
- Most of our data is non-spatial, but could be made spatial
- Election results
- Death rates
- Welfare payments
- Conflict
- We can make this data spatial if we link it to existing spatial (location) data
- Using common identifiers in both datasets
- Non-spatial joins
Non-Spatial Joins
- Governments publish school performance data
- But what is the spatial pattern of school performance?
- Better in the city centre or in the suburbs?
- We need a source for the location of the schools
- Perhaps from a separate geographical survey
- Or by georeferencing their addresses
- How do we combine the school performance and location datasets?
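One way to combine them is a non-spatial join on a common identifier. The sketch below uses invented records, a hypothetical `school_id` column and a hypothetical `join_on` helper; in practice a data-frame library would do the same job.

```python
# Invented performance data published by the government.
performance = [
    {"school_id": "S1", "score": 6.1},
    {"school_id": "S2", "score": 4.8},
    {"school_id": "S3", "score": 5.5},
]

# Invented location data from a separate source, sharing the same identifier.
locations = [
    {"school_id": "S1", "lon": -46.66, "lat": -23.55},
    {"school_id": "S2", "lon": -46.73, "lat": -23.56},
]


def join_on(left, right, key):
    """Attach the matching right-hand row to each left-hand row by a shared key."""
    index = {row[key]: row for row in right}
    joined, unmatched = [], []
    for row in left:
        match = index.get(row[key])
        if match is None:
            unmatched.append(row[key])  # keep track of rows with no location
        else:
            joined.append({**row, **match})
    return joined, unmatched


schools, without_location = join_on(performance, locations, "school_id")
```

Each joined record now carries both a score and a coordinate, so performance can be mapped. Checking the unmatched identifiers is worthwhile, because schools that fail to join silently disappear from the map.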